egocentric video
A Self Validation Network for Object-Level Human Attention Estimation
Zehua Zhang, Chen Yu, David Crandall
Some recent work [22, 66, 68] has discussed estimating probability maps of ego-attention or predicting gaze points in egocentric videos. However, people think not in terms of points in their field of view, but in terms of the objects that they are attending to. Of course, the object of interest could be obtained by first estimating the gaze point with a gaze estimator, generating object candidates with an off-the-shelf object detector, and then picking the object that the estimated gaze falls in. Because this bottom-up approach estimates where and what separately, it is doomed to fail whenever the eye gaze prediction is even slightly inaccurate, such as when the gaze falls between two objects or in the intersection of multiple object bounding boxes (Figure 1).
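The bottom-up baseline described above can be sketched in a few lines. This is a minimal illustration, not the paper's model; the function name, data layout, and smallest-box tie-break are all hypothetical, and it makes visible the failure mode: when the estimated gaze point lands outside every detected box, the pipeline has no answer.

```python
def pick_attended_object(gaze_xy, detections):
    """Bottom-up baseline sketch (hypothetical helper, not the paper's method).

    gaze_xy: (x, y) estimated gaze point in image coordinates.
    detections: list of dicts with 'box' = (x1, y1, x2, y2) and 'label',
        e.g. from an off-the-shelf object detector.
    Returns the detection whose box contains the gaze point, or None if
    the gaze falls outside every box -- the failure mode noted above.
    """
    x, y = gaze_xy
    hits = [d for d in detections
            if d["box"][0] <= x <= d["box"][2]
            and d["box"][1] <= y <= d["box"][3]]
    if not hits:
        return None  # gaze fell between objects: no attended object found
    # If the gaze lies in the intersection of several boxes, some tie-break
    # is needed; here we arbitrarily prefer the smallest box.
    return min(hits, key=lambda d: (d["box"][2] - d["box"][0])
                                 * (d["box"][3] - d["box"][1]))
```

A slightly off gaze estimate (e.g. a point in the gap between two adjacent boxes) returns None here, which is exactly why estimating where and what separately is fragile.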
Look Ma, No Hands!
The analysis and use of egocentric videos for robotic tasks is made challenging by occlusion from the hand and by the visual mismatch between the human hand and a robot end-effector. In this sense, the human hand is a nuisance. However, hands also often provide a valuable signal: the hand pose, for example, may suggest what kind of object is being held.
EgoTaskQA: Understanding Human Tasks in Egocentric Videos
These questions are divided into four types, including descriptive (what status?), predictive (what will?), explanatory (what caused?), and counterfactual (what if?), to provide diagnostic analyses on spatial, temporal, and causal understandings of goal-oriented tasks. We show an illustrative scenario where two subjects collaborate to make and drink cereal.
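The four question types above can be illustrated with a small sketch. This is a hypothetical helper for exposition only, not part of the EgoTaskQA benchmark code; the prefix-matching rule and the "other" fallback are assumptions.

```python
# Hypothetical mapping from question prefix to the four EgoTaskQA
# question types described above (illustration only).
QUESTION_TYPES = {
    "what status": "descriptive",      # spatial/state understanding
    "what will": "predictive",         # temporal understanding
    "what caused": "explanatory",      # causal understanding
    "what if": "counterfactual",       # causal understanding
}

def question_type(question):
    """Return the question's type by its leading phrase, or 'other'."""
    q = question.strip().lower()
    for prefix, qtype in QUESTION_TYPES.items():
        if q.startswith(prefix):
            return qtype
    return "other"
```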